Environmental Health Perspectives — Latest Matching Preprints

1

Consumer-Product Chemical Mixture and Systemic Inflammation: Survey-Weighted Analysis of Seven Urinary Biomarkers in NHANES 2005-2010

Jobe, N. I.

2026-06-10 occupational and environmental health 10.64898/2026.06.08.26355076 medRxiv

Top 0.1%

8.7%

Background: Endocrine-disrupting chemicals (EDCs) in consumer products are ubiquitously detected in human biospecimens, yet most epidemiological studies examine single chemicals rather than real-world co-exposures. We evaluated associations between a mixture of seven urinary chemical biomarkers and systemic inflammation. Methods: Survey-weighted log-log regression models adjusted for age, sex, race/ethnicity, poverty-income ratio, and survey cycle were fitted with Benjamini-Hochberg FDR correction (primary analysis, N=4,864). A sensitivity analysis additionally adjusted for body mass index and smoking status (N=4,494). Results: In the primary analysis, 5 of 7 chemicals showed significant associations after FDR correction: ethylparaben (β = -0.056, FDR P < .001), propylparaben (β = -0.026, FDR P = .007), bisphenol A (β = +0.052, FDR P = .005), monoethyl phthalate (β = +0.043, FDR P = .002), and monocyclohexyl phthalate (MCP; β = +0.215, FDR P = .007). The weighted quantile sum (WQS) mixture index was significantly associated with C-reactive protein (CRP) (β = +0.056, 95% CI [0.031, 0.081], P < .001), with monocyclohexyl phthalate carrying the largest mixture weight (0.342). In the BMI- and smoking-adjusted sensitivity analysis, associations attenuated to null for all chemicals, though MCP preserved direction (β = +0.129) and the WQS mixture direction was maintained (β = +0.018). Two multiple imputation sensitivity analyses confirmed that monocyclohexyl phthalate was the only chemical to maintain a positive direction across all four analytical specifications (primary complete-case, BMI-adjusted complete-case, primary-aligned imputation, and BMI-adjusted imputation), reaching statistical significance in three of four specifications and providing convergent evidence of a robust MCP-inflammation association. 
Conclusions: The chemical mixture showed a significant collective association with systemic inflammation, consistent with a cumulative pro-inflammatory burden from co-exposure to multiple consumer product chemicals. These findings suggest that regulatory approaches should shift from single-chemical to mixture-based risk assessment frameworks for consumer product safety.
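The Benjamini-Hochberg FDR correction named in the Methods can be illustrated with a minimal sketch. The function name and the seven raw p-values below are hypothetical and are not taken from the study; the abstract reports only the adjusted values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for seven biomarkers (illustration only).
raw = [0.0001, 0.004, 0.003, 0.0008, 0.005, 0.30, 0.62]
adj = benjamini_hochberg(raw)
n_significant = sum(p < 0.05 for p in adj)
print(adj, n_significant)
```

With these made-up inputs, five of the seven tests stay below an FDR threshold of 0.05, which mirrors the shape of the reported result without reproducing it.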

2

Multi-Pathogen Wastewater Surveillance enables Real-Time Targeted Public Health Interventions During the 2025 African Nations Championship Football Tournament

Nsawotebba, A.; Morunyanga, I.; Nakintu, V.; Kabazzi, J.; Magala, J.; Uragiwenimana, V.; Ssekyondwa, S.; Kasujja, R.; Onywera, H.; Hull, N.; Akejo, D. S.; Dambya, C.; Ikoba, S.; Baraka, V.; Tebeje, Y. K.; Barigye, E.; Cham, F.; Ssewanyana, I.; Nabaasa, H.; Muruta, A.; Olaro, C.; Atwine, D.; Nabadda, S.; Acheng, J. R.

2026-06-08 occupational and environmental health 10.64898/2026.06.05.26354973 medRxiv

Top 0.2%

2.4%

Mass gatherings pose significant public health risks by facilitating the spread of infectious diseases. While wastewater-based surveillance (WBS) has been widely used to monitor pathogens in high-income settings, its use as a practical, multi-pathogen surveillance tool during mass gatherings in low- and middle-income countries remains limited. This study aimed to assess the operational feasibility, epidemiological significance, and public health utility of multi-pathogen WBS during the African Nations Championship (CHAN) football tournament in Uganda. Wastewater surveillance was conducted at Mandela National Stadium during eight match days in August 2025. Moore swabs were deployed at 38 manholes receiving wastewater from different toilet facilities across the stadium to capture representative wastewater samples. Samples were processed using Nanotrap® microbiome virus particles to concentrate pathogens, followed by nucleic acid extraction. Samples were analyzed for multiple enteric and respiratory pathogens, including mpox virus, using quantitative PCR (qPCR). Descriptive analyses were performed to characterize pathogen detection patterns, positivity rates, and temporal distribution across surveillance sites. A total of 304 wastewater samples were collected and analyzed, of which 259 (85.2%) tested positive for at least one pathogen. Multiple pathogens were consistently detected across sampling days, with enteric pathogens predominating, particularly Shigella spp. (53.6%), Rotavirus A (35.9%) and Enterovirus (32.2%). The mpox virus was also detected in a notable proportion of samples (28.6%) across several sampling days. Respiratory pathogens, including SARS-CoV-2 (11.8%) and Influenza B (8.2%), were identified intermittently at lower frequencies. Pathogen diversity varied over time, with up to eight pathogens detected on a single day, and co-detection of multiple pathogens observed in the majority of positive samples. 
Cq value distributions further demonstrated variability in detected signal patterns across pathogens. Surveillance findings informed real-time public health interventions, including sanitation reinforcement, intensified hygiene promotion, environmental disinfection, and targeted risk communication, strengthened syndromic surveillance with on-site triage, and targeted environmental health assessments of food handling and wastewater infrastructure. These findings demonstrate the operational feasibility and public health utility of integrating multi-pathogen wastewater-based surveillance into mass-gathering preparedness and response frameworks in low-resource settings. By capturing diverse pathogen signals and informing targeted interventions during the CHAN football tournament, WBS can provide actionable population-level insights that can support outbreak preparedness and response. Scaling WBS within national preparedness systems could strengthen epidemic intelligence, enhance early warning capacity, and support data-driven public health decision-making during future mass gatherings and emerging infectious disease threats.

3

Genomic wastewater surveillance of seasonal and zoonotic influenza A viruses in California during the 2024-2025 flu season

Wang, A. L.-W.; Lamtyugina, A.; Jiang, M.; Yu, A. T.; Lu, C.; Wadford, D.; Burnor, E.; Pipes, L.; Kantor, R.; Nelson, K. L.

2026-06-12 epidemiology 10.64898/2026.06.10.26355323 medRxiv

Top 0.5%

0.9%

Wastewater genomic surveillance provides an opportunity to detect human and animal influenza A virus (IAV). We aimed to implement an IAV genomic surveillance framework agnostic to subtype, which enables recovery of IAV from multiple hosts and estimation of proportions across subtypes. We conducted IAV genomic surveillance in wastewater during the 2024-2025 flu season at multiple sites in California and compared these data with available human clinical IAV sequences and test positivity. We applied a custom whole-genome, multi-host IAV probe enrichment panel and adapted our custom expectation-maximization (EM) algorithm to deconvolute IAV mixtures in wastewater and infer subtype relative abundances. Absolute IAV concentrations were quantified using RT-PCR-based assays. H5N1 wastewater and clinical sequences were further characterized by constructing a whole-genome maximum-likelihood phylogenetic tree. Finally, we performed variant analysis to examine amino acid substitutions detected in wastewater. Our IAV probe enrichment method and EM algorithm successfully enriched all eight segments of three circulating IAV subtypes and accurately estimated subclade relative abundances for mixed IAV samples. Seasonal human H1N1pdm09 and H3N2 were detected throughout the study period from both wastewater and clinical sequencing data, with H1N1 subclades 6B.1A.5a.2a.1 and 6B.1A.5a.2a co-circulating, and H3N2 dominated by subclade 3C.2a1b.2a.2a.3a.1. Wastewater surveillance consistently detected H5N1 clade 2.3.4.4b across three monitored wastewater sites, while clinical H5N1 detections, from anywhere in CA, were sporadic and rare. Whole-genome phylogenetic analysis revealed that wastewater H5N1 sequences clustered with reference sequences associated with dairy cow and avian infections, while all human clinical H5N1 sequences clustered exclusively with reference sequences associated with dairy cow infections. 
Amino acid substitutions were identified across viral segments, and no mutations associated with mammalian adaptation were observed from wastewater samples.
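The idea of using expectation-maximization to infer subtype relative abundances from mixed reads can be sketched generically. This is not the authors' custom algorithm: it is the textbook EM update for mixture weights, assuming each read already has a known likelihood under each candidate subtype. The function name and the toy likelihood table are hypothetical.

```python
def em_mixture_proportions(likelihoods, n_iter=200):
    """Estimate mixture weights given per-read likelihoods under each component.

    likelihoods[r][k] is the (assumed known) likelihood of read r under subtype k.
    """
    n_components = len(likelihoods[0])
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        totals = [0.0] * n_components
        for row in likelihoods:
            # E-step: posterior responsibility of each subtype for this read.
            joint = [w * l for w, l in zip(weights, row)]
            norm = sum(joint)
            for k in range(n_components):
                totals[k] += joint[k] / norm
        # M-step: new weights are the average responsibilities.
        weights = [t / len(likelihoods) for t in totals]
    return weights


# Toy data: 6 reads match subtype A only, 2 match subtype B only,
# and 2 are equally compatible with both (illustration only).
reads = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 2 + [[0.5, 0.5]] * 2
proportions = em_mixture_proportions(reads)
print(proportions)
```

In this toy case the ambiguous reads are split in proportion to the unambiguous evidence, so the estimate settles at roughly three quarters subtype A.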

4

Long-term exposure to PM2.5 components and lipid profiles in WTC Health Program general responders

Krasnov, H.; Knobel, P.; Hsiao-Hsien Hsu, L.; Teitelbaum, S.; Mclaughlin, M.; Just, A. C.; Kloog, I.; Yitshak Sade, M.

2026-06-11 epidemiology 10.64898/2026.06.10.26355272 medRxiv

Top 0.5%

0.7%

Fine particulate matter (PM2.5) has been associated with elevated blood lipids, but fewer studies have examined the associations with specific constituents of PM2.5. We studied the associations between exposure to annual PM2.5 and its 14 constituents, and repeated blood lipid measurements among general responders enrolled in the World Trade Center Health Program between 2003 and 2019 (n = 44,876). We used generalized additive mixed effect models to investigate the single-pollutant associations with repeated measures of blood total cholesterol (TC) and high- and low-density lipoprotein cholesterol (HDL-C and LDL-C) levels. We then used linear generalized weighted quantile sum regression with a random intercept for participant ID to account for the clustering of repeated measures and evaluate the combined associations with the component mixture. A decile increase in the mixture of 14 PM2.5 chemical components was associated with a 0.375 mg/dL increase in TC levels (95% confidence interval (CI): 0.174, 0.577) and a 0.302 mg/dL increase in LDL-C (95% CI: 0.063, 0.540). Lead, organic carbon, and iron were major drivers of both associations. Component-specific models also showed higher TC and LDL-C levels associated with interquartile range increases in organic carbon (0.472, 95% CI [0.027, 0.918] and 0.648, 95% CI [0.136, 1.160]) and iron exposure (1.081, 95% CI [0.630, 1.532] and 0.748, 95% CI [0.318, 1.178]). In conclusion, we found PM2.5 exposure to be associated with elevated lipid levels. The associations differed by PM2.5 composition, highlighting organic carbon, lead, and iron as major drivers. These findings are highly significant for a population exposed to an extreme air pollution event and susceptible to lipid alterations that might trigger cardiovascular events.

5

Global Health Injustice From Climate Change Driven By Consumption

Rupcic, L.; Yoo, D.; Levasseur, A.; Alexandre, C.; Laurent, A.; Jolliet, O.

2026-06-12 public and global health 10.64898/2026.06.10.26355381 medRxiv

Top 0.7%

0.5%

Climate change imposes unequal health burdens from heat and cold, disproportionately harming vulnerable nations least responsible for emissions. A framework to quantitatively attribute this damage to different countries' consumption patterns has been missing. We developed a global framework linking consumption-based greenhouse gas emissions to country-specific health burdens, measured in Disability-Adjusted Life Years (DALYs). Our results quantify the profound scale of this externalized harm. For example, average North American consumption imposes a global health burden of 34 days of healthy life per person per year, without net damage suffered. In contrast, Sub-Saharan Africa endures 25 days per person per year despite minimal emissions. The resulting Health Injustice Index provides a powerful instrument for climate accountability, reframing responsibility in terms of tangible human health impacts.

6

Bias from small-count suppression in county-level cancer disparity estimates: a calibrated simulation study

Gahan, K.

2026-06-08 epidemiology 10.64898/2026.06.05.26355021 medRxiv

Top 0.8%

0.4%

Background. Area-level cancer disparities are routinely estimated from public county data in which rates based on small counts (fewer than 16 cases or deaths) are suppressed. Analysts typically drop suppressed counties (complete-case analysis). Because suppression depends on case counts tied to population size and demographic composition, this missingness may be informative, but its effect on the disparity estimate has not, to our knowledge, been quantified. Methods. In a cross-sectional ecological study of 3,143 U.S. counties (analytic sample 3,018 with computable exposure) using one frozen public release of NCI State Cancer Profiles incidence and mortality data and ACS 2018-2022 5-year data, we estimated the most- versus least-deprived ICE(race+income) quintile rate ratio (RR) and rate difference for female breast, stomach, and cervix cancers under four suppression-handling methods: complete-case, available-case, bounding, and model-based small-area estimation. We characterized which counties were erased, and, following the ADEMP framework, ran a Monte Carlo simulation (1,000 replicates per cell; Monte Carlo standard error of bias approximately 0.0025) calibrated to the release to measure bias against a known truth. Analyses were pre-registered. Results. The suppressed fraction rose with rarity: 7.4% of counties for breast, 61.3% for stomach, and 75.7% for cervix incidence. Suppression was concentrated in the most-deprived quintile (cervix, 81.8% suppressed vs 63.8% least-deprived) and overwhelmingly removed rural rather than minority residents (cervix: 81% of the rural but 9% of the minority population erased). For breast (little suppression) the RR was 0.87 (95% CI 0.85-0.89) and identical across methods; for cervix incidence the complete-case RR (1.56) exceeded the model-based estimate (1.50), and for cervix mortality (91% suppressed) complete-case (1.86) exceeded model-based (1.56) by 16% with a wide bounding interval (1.88-2.62). 
In calibrated simulation, population-weighted complete-case bias was small (less than 2%) at the observed deprivation-county-size correlation and grew with rarity, threshold, and unweighted aggregation; its direction was conditional, becoming positive (over-estimation) as deprived counties became smaller. Conclusions. Complete-case handling of suppressed counties over-estimates rare-cancer area disparities relative to methods that retain them, while silently erasing most of the rural and most-deprived communities the estimate is meant to represent. The effect is negligible for common cancers and grows with rarity. Public-data disparity analyses should report the suppressed fraction and use bounded or model-based estimates by default. Keywords: cancer disparities; small-count suppression; Index of Concentration at the Extremes; informative missingness; small-area estimation; rural health.
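How dropping suppressed counties can shift a rate ratio is easy to see on a toy table. The sketch below applies the fewer-than-16 suppression rule described in the abstract to six invented counties and compares the population-weighted complete-case rate ratio with the one computed from all counts. The county values and function names are hypothetical; this is not the study's simulation.

```python
SUPPRESSION_THRESHOLD = 16  # counts below this are suppressed, per the abstract

# (deprivation group, case count, population); invented values for illustration.
counties = [
    ("most", 20, 100_000), ("most", 8, 50_000), ("most", 6, 40_000),
    ("least", 30, 200_000), ("least", 10, 80_000), ("least", 18, 120_000),
]


def pooled_rate(rows, group):
    """Population-weighted rate: total cases over total population in a group."""
    cases = sum(c for g, c, _ in rows if g == group)
    population = sum(p for g, _, p in rows if g == group)
    return cases / population


def rate_ratio(rows):
    """Most- versus least-deprived rate ratio."""
    return pooled_rate(rows, "most") / pooled_rate(rows, "least")


# Complete-case analysis keeps only counties whose counts are published.
visible = [row for row in counties if row[1] >= SUPPRESSION_THRESHOLD]
suppressed_fraction = 1 - len(visible) / len(counties)

rr_full = rate_ratio(counties)       # uses the true counts
rr_complete_case = rate_ratio(visible)
print(suppressed_fraction, rr_full, rr_complete_case)
```

Here the complete-case ratio is larger than the full-data ratio because the dropped most-deprived counties have lower rates than the one that remains. As the abstract notes, the direction of the bias is conditional, so a different toy table could push it the other way.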

7

Metatranscriptomics-Derived Disease Risk Scores as a Preventive, Diagnostic, and Treatment Support Tool

Hu, L.; Bass, M.; Patridge, E.; Molusky, M.; Antoine, G.; Vuyisich, M.; Banavar, G.

2026-06-06 genetic and genomic medicine 10.64898/2026.05.29.26354333 medRxiv

Top 0.9%

0.3%

Background: Chronic diseases and symptom syndromes often develop after prolonged biological changes that may precede formal diagnosis. RNA-based metatranscriptomics captures active microbial and human gene expression and may provide a functional layer for disease risk evaluation. To address this translational gap, we developed and validated a Disease Risk Score (DRS) framework that integrates metatranscriptome-derived pathway activity scores from stool, saliva, and blood samples, and evaluated its potential clinical utility as an adjunct risk-evaluation tool. Methods: DRS uses disease-specific sets of pathway activity scores derived from stool and saliva microbial functions, stool and saliva microbial taxa, and blood human gene expression. For each disease, 'not optimal' pathway scores are aggregated into a normalized cumulative odds ratio, or cOR, using score-level odds ratios, statistical significance, and literature-supported biological relevance derived from a Development Cohort of 22,369 individuals. A cOR ≥ 5 is defined as high risk. Performance is evaluated in an independent Validation Cohort of 15,908 individuals using self-reported diseases as the reference. Disease support requires both significant cOR separation between self-reported and not-reported (Cohen's d ≥ 0.2) and risk ratio enrichment of self-reported disease among individuals classified as high risk (95% CI of Risk Ratio > 1). Results: Of 20 initially evaluated diseases, 15 meet the prespecified validation criteria on the independent validation cohort: ADHD, anxiety, chronic fatigue syndrome, depression, GERD, hypertension, inflammatory bowel disease, IBS-C, IBS-D, insomnia, MASLD, obesity, obstructive sleep apnea, Sjögren's syndrome, and type 2 diabetes. 
Five selected clinical scenarios illustrate how DRS can support clinician-mediated decision making, including IBS subtype reclassification, improved diagnostic acceptance in IBS-D, personalized lifestyle counseling in MASLD and early type 2 diabetes, and diagnostic uncertainty in atypical GERD. Conclusions: DRS is a metatranscriptomics-based risk-stratification framework that aggregates active microbial and human pathway signals into interpretable disease-specific risk estimates across a wide range of disease conditions. Validation against self-reported disease labels in an independent cohort shows significant risk enrichment for each of 15 diseases. DRS is intended as an adjunct to clinical evaluation: a decision support tool in situations where routine care encounters uncertainty, delay, or low patient engagement. Future prospective studies using clinically adjudicated endpoints are needed to assess calibration and clinical outcomes.
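The two validation criteria stated in the Methods (Cohen's d ≥ 0.2 and a 95% CI of the risk ratio above 1) can be illustrated as follows. This is a minimal sketch, assuming a pooled-standard-deviation Cohen's d and a log-scale Wald interval for the risk ratio; the abstract does not say which estimators the authors used, and all numbers below are invented.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def risk_ratio_ci(cases_high, n_high, cases_other, n_other, z=1.96):
    """Risk ratio with a Wald confidence interval computed on the log scale."""
    rr = (cases_high / n_high) / (cases_other / n_other)
    se_log = math.sqrt(1 / cases_high - 1 / n_high + 1 / cases_other - 1 / n_other)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Invented cOR values for people who did / did not self-report a disease.
cor_reported = [6.0, 7.0, 5.0, 8.0, 6.0]
cor_not_reported = [4.0, 5.0, 3.0, 6.0, 4.0]
d = cohens_d(cor_reported, cor_not_reported)

# Invented counts: 60 of 400 high-risk and 150 of 2,000 other individuals report the disease.
rr, ci_low, ci_high = risk_ratio_ci(60, 400, 150, 2000)

# Both criteria from the abstract must hold for a disease to count as supported.
disease_supported = d >= 0.2 and ci_low > 1.0
print(d, rr, ci_low, ci_high, disease_supported)
```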

8

Seasonality, source type, and women's water labor: A longitudinal mixed-methods study in Kenya and Honduras

Mink, T.; Ogutu, E.; Patrick, M.; Sinharoy, S.; Bolanos Gamez, M. V.; Macler, A.; Ngo, C. P.; Oglesby, H.; Bendit, O.; White, J.; Antonio, S.; Ramos, G.; Roldan Medina Lopez, E.; Atandi, E.; Mwangi, P.; Koome, P.; Otieno Onyango, R.; Otuya, P. A.; Ruto, P.; Caruso, B. A.

2026-06-10 public and global health 10.64898/2026.06.09.26355008 medRxiv

Top 0.9%

0.3%

Women shoulder the majority of water collection labor globally, yet how their water collection and water-related work experiences may change over time or by water source type remains insufficiently understood. We conducted a longitudinal, mixed-methods study in rural Kenya and Honduras to understand how women's experiences collecting water and performing water-related work varied between (a) two time points, (b) improved and unimproved water source types, and (c) water source location. Data were collected in 2023 and 2024 using interviews, observation, GPS-enabled watches, and scales to measure time and distance traveled, water weight and volume carried, and calories expended. A total of 133 women participated in data collection (66 Kenya, 67 Honduras). We compared women's experience data by time point (2023 vs. 2024), source type (improved vs. unimproved), and source location (off-premises vs. on-premises) (t-test, Mann-Whitney U test). We also mapped participants' routes and activities to show which sources were visited, when, and for what activities. In Kenya, mean water collection time, distance, and caloric expenditure were significantly lower and water volume was significantly higher in 2024, when there were unexpected rains, than in 2023, when there was a persistent drought. When comparing source types during the 2023 drought, journeys to improved sources took significantly less time and energy and covered less distance than journeys to unimproved sources. These differences were not observed during the rainy conditions of 2024, when unimproved sources were closer and more accessible. In Honduras, water collection and water work burdens did not differ significantly by time point or source type. We found that women with on-premises water access still expended considerable time and calories on water work within their household compounds. 
Findings from Kenya suggest that water infrastructure improvements can reduce women's water collection burdens, though benefits may depend on and vary by season and source location. Findings from Honduras show that water labor does not end once water is in the household. Rather, substantial time and energy are expended carrying out water-related work even when sources are on premises, suggesting that efforts to assess water labor need to extend beyond collection alone. To meaningfully reduce burdens and ensure improved water sources are utilized during all seasons, initiatives need to consider source location, seasonal variability, and work beyond collection. Evaluations to assess infrastructure impacts on women's labor and well-being are needed and long overdue.

9

Deconvolution-based cell-type specific DNA methylation-wide and transcriptome-wide association studies identify risk CpG sites and genes associated with colorectal cancer risk

Li, Q.; Xu, L.; Wang, J.; Li, C.; Wen, W.; Shu, X.; Yang, Y.; Shu, X.-o.; Cai, Q.; Long, J.; Singh, B.; Lau, K. S.; Yin, Z.; Casey, G.; Song, M.; Peters, U.; Zheng, W.; Guo, X.

2026-06-12 genetic and genomic medicine 10.64898/2026.06.11.26355460 medRxiv

Top 1%

0.3%

Bulk tissue-based DNA methylation-wide (MWAS) and transcriptome-wide association studies (TWAS) have identified CpG sites and genes associated with colorectal cancer (CRC) risk, but do not account for cellular heterogeneity. To address this, we developed a deconvolution-informed framework to infer cell-type specific DNA methylation and gene expression profiles from bulk normal colon tissues using reference single-cell epigenomic and transcriptomic datasets. We performed cell-type specific MWAS (ctMWAS) using deconvoluted DNA methylation data from 293 normal colon samples and conducted cell-type specific TWAS (ctTWAS) using deconvoluted gene expression data from 707 normal colon samples. Genetically predicted methylation and expression models were integrated with CRC GWAS summary statistics (78,473 cases and 107,143 controls) to identify risk-associated CpG sites and genes. Through ctMWAS, ctTWAS, and colocalization analyses, we identified 178 significant cell-type-specific CpG sites in 106 loci and 68 risk genes in 40 loci, including 26 previously unreported loci. Through additional integrative methylation-gene analysis, we prioritized 132 candidate risk genes, the majority of which were supported by multi-omics evidence and stage-specific dysregulation across the adenoma-carcinoma and serrated-carcinoma progression pathways. Pathway enrichment analyses implicated pathways involved in DNA double-strand break repair, TP53 regulation, TGF-β signaling, and innate immune responses. Among prioritized genes, 14 were identified as putative druggable targets linked to 90 FDA-approved or clinical-stage drugs. Experimental validation supports an oncogenic role for SF3A3. These findings demonstrate that deconvolution-informed integrative analyses enable cell-type-resolved identification of epigenetic and transcriptional mechanisms underlying CRC susceptibility and provide insights into disease biology, prevention, and therapeutic target discovery.

10

Multi-ancestry genome-wide association study and meta-analysis of stimulant use disorder reveals biology and relationships to other psychiatric disorders

Beck, S. E.; Deak, J. D.; Levey, D. F.; Ge, T.; Jeffries, P. W.; Lai, D.; Mallard, T. T.; Degenhardt, L.; Lind, P. A.; Tollerup Nielsen, T.; Tubbs, J. D.; Wetherill, L.; Johnson, E. C.; Hatoum, A. S.; The SUD Working Group of the Psychiatric Genomics Consortium; COGA Collaborators; Yale-Penn Collaboration; The VA Million Veteran Program; Borglum, A.; Demontis, D.; Medland, S. E.; Martin, N. G.; Nelson, E. C.; Smoller, J. W.; Kranzler, H. R.; Gaziano, J. M.; Stein, M. B.; Agrawal, A.; Edenberg, H. J.; Gelernter, J.

2026-06-10 genetic and genomic medicine 10.64898/2026.06.05.26354997 medRxiv

Top 1%

0.3%

Stimulant use disorder (StimUD) is a significant public health problem, but genetic studies have been limited by small sample sizes. We conducted genome-wide association studies (GWAS) of StimUD in the Million Veteran Program (MVP) and All of Us (AOU), followed by meta-analysis with FinnGen and 10 additional datasets, for a total of 709,369 individuals (Ncases=33,977, Ncontrols=675,392) in four broad ancestry groups: European (EUR) (Ncases=22,564, Ncontrols=624,672), African (AFR) (Ncases=7,574, Ncontrols=34,189), Admixed American (AMR) (Ncases=3,657, Ncontrols=15,698), and East Asian (EAS) (Ncases=182, Ncontrols=833). Population-specific SNP heritability was 6.1% in EUR and 2.4% in AFR. We discovered a total of 19 genome-wide-significant loci: six in EUR, including DRD2*rs5794864 (P=7.32E-10); one in AFR; five in a multi-ancestry meta-analysis, including CHRNA5*rs55781567 (P=3.27E-9); two in a male-only meta-analysis, including FTO*rs8057044 (P=9.50E-9); and five in a meta-analysis of sex-stratified results. In a hold-out AOU subsample (NEUR=18,841, NAFR=12,263, NAMR=9,739), ancestry-specific polygenic risk scores were significantly associated with StimUD in EUR (OR=3.28, 95% confidence interval (CI)=2.89-3.71) and AMR (OR=2.01, 95% CI=1.71-2.37). Transcriptome-wide association studies, fine-mapping, and colocalization analyses prioritized additional genes (e.g., GPX1, BSN). Genetic correlation, Mendelian randomization, and causal mixture analyses revealed relationships with other substance use and use disorder phenotypes, including cannabis use disorder (rg=0.94, P=5.43E-237) and opioid use disorder (rg=1.01, P=4.40E-107), and other psychiatric traits, including anxiety, depression, neuroticism, and attention-deficit/hyperactivity disorder. This is the first well-powered GWAS of StimUD, and it offers significant insights into disease biology.

11

Population-scale detection of methylation outliers from long-read genome sequencing

Jensen, T. D.; Kaur, R.; Bonner, D. E.; Nguyen, J.; Reuter, C. M.; Undiagnosed Diseases Network; Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium; Ashley, E. A.; Bernstein, J. A.; Wheeler, M. T.; Montgomery, S. B.

2026-06-11 genetic and genomic medicine 10.64898/2026.06.09.26355279 medRxiv

Top 1%

0.3%

Background: Aberrant DNA methylation can mediate the functional effects of rare genetic variation and contribute to imprinting disorders, repeat expansion diseases, and other pathogenic regulatory mechanisms. Long-read sequencing technologies now enable genome-wide detection of CpG methylation alongside genetic variation from a single assay. However, methods for systematic identification and interpretation of methylation outliers from long-read sequencing data remain limited. Methods: We developed METAFORA, a computational workflow for detecting methylation outlier regions from PacBio and Oxford Nanopore long-read sequencing data. METAFORA constructs population-level methylation references, segments the genome into correlated CpG blocks, infers technical and biological sources of variation through hidden factor estimation, models uncertainty due to variable depth sequencing, and computes covariate-adjusted methylation outlier scores for individual samples. We applied METAFORA across large long-read sequencing cohorts and integrated methylation outliers with multi-omic data. METAFORA is implemented as a snakemake workflow available at https://github.com/tjense25/METAFORA. Results: METAFORA identified methylation outlier regions associated with rare structural variants, tandem repeat expansions, and imprinting abnormalities. We found outlier regions were enriched for molecular outliers across transcriptomic and chromatin accessibility datasets, supporting their functional relevance in gene regulation. In a representative case, METAFORA identified an imprinting defect affecting the GNAS locus associated with an STX16 deletion. Conclusions: METAFORA enables scalable detection and interpretation of methylation outliers from long-read sequencing data and provides a framework for integrating epigenetic outliers with genomic and multi-omic analyses. 
These approaches may improve interpretation of rare regulatory variation and support discovery of clinically relevant epigenetic abnormalities in genomic medicine.

12

Local Influenza Forecasts Outperform State-Level Forecasts in the United States

Kim, D.; Pasco, R.; Johnson, K. E.; Fox, S. J.; Reich, N. G.; Meyers, L. A.

2026-06-08 infectious diseases 10.64898/2026.06.04.26354836 medRxiv

Top 1%

0.3%

Accurate outbreak forecasts are critical for timely and effective public health response. In the United States, however, most forecasts are produced at the state level, which can mask substantial sub-state heterogeneity and limit their utility for local planning. We generated and evaluated forecasts of the percentage of Emergency Department visits attributable to influenza across 173 large metropolitan Health Service Areas (HSAs) using a gradient boosting quantile regression (GBQR) model, and compared their accuracy to forecasts derived from state-level data alone. At one-week, two-week, and three-week horizons, local forecasts outperformed state-based forecasts in 98.8%, 90.8%, and 78.6% of HSAs, respectively, achieving mean weighted interval scores that were on average 39.2% lower (95% range: 5.9% to 76.7%), 19.6% lower (-6.3% to 59.5%), and 11.4% lower (-11.7% to 44.9%), respectively. The performance advantage of local forecasting was strongest in HSAs representing a smaller share of their state's population and increased with the proportion of the HSA population living in urban areas and the number of metropolitan areas within a state. These results, based on an analysis of HSAs with populations greater than 250,000, demonstrate that fine-scale modeling can substantially improve forecast accuracy and highlight the potential value of local forecasts for outbreak preparedness and response.
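The weighted interval score used to compare the forecasts can be sketched for a single forecast. This follows the commonly used definition for quantile forecasts (a median plus central prediction intervals); whether the authors used exactly this weighting is an assumption, and the forecast values below are invented.

```python
def interval_score(lower, upper, observed, alpha):
    """Interval score for a central (1 - alpha) prediction interval."""
    width = upper - lower
    # Penalties apply only when the observation falls outside the interval.
    under = (2.0 / alpha) * max(lower - observed, 0.0)
    over = (2.0 / alpha) * max(observed - upper, 0.0)
    return width + under + over


def weighted_interval_score(median, intervals, observed):
    """WIS from a median and a dict mapping alpha -> (lower, upper)."""
    total = 0.5 * abs(observed - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, observed, alpha)
    return total / (len(intervals) + 0.5)


# Invented forecast of the % of ED visits due to influenza; 4.0% was observed.
forecast_intervals = {0.5: (2.5, 3.5), 0.1: (1.5, 4.5)}
wis = weighted_interval_score(median=3.0, intervals=forecast_intervals, observed=4.0)
print(wis)
```

Lower scores are better. Narrow intervals are rewarded only when they still contain the observation, which is why a relative reduction in mean WIS is a reasonable summary of forecast improvement.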

13

Polygenic risk scores associate with asthma phenotypes and proteomic analyses implicate IL1R1 in two family-based studies

Lee, S.; Moll, M.; Mendez, K.; Prince, N.; Lasky-Su, J.; Lutz, S. M.; Weiss, S. T.; Lange, C.; Kelly, R. S.; Hecker, J.

2026-06-11 genetic and genomic medicine 10.64898/2026.06.06.26355045 medRxiv

Top 1%

0.2%

Despite its high prevalence and the discovery of hundreds of genetic associations, the genetic determinants and heterogeneous manifestations of asthma remain incompletely understood. Incorporating polygenic risk scores (PRS) into asthma research offers a powerful approach to quantify inherited susceptibility, refine risk profiles, and advance mechanistic understanding of disease development. For this study, we leveraged whole-genome sequencing (WGS) data from two family-based cohorts of childhood asthma - the Genetics of Asthma in Costa Rica Study (GACRS) and the Childhood Asthma Management Program (CAMP) - to examine the transmission profiles of externally derived asthma PRS and their associations with clinical phenotypes in children with asthma. To further elucidate molecular mechanisms, we integrated large-scale external genome-wide association study (GWAS) summary statistics and genetic prediction models of protein abundance in a two-step proteome-wide association study (PWAS) of asthma. Our findings provide robust evidence supporting the validity of externally derived asthma PRS (asthma PRS association p = 10⁻²⁴ [GACRS and CAMP trios combined] for the Global Biobank Meta-analysis Initiative [GBMI]) and reveal consistent associations with spirometry measures and atopy markers across both studies, as 13 of 21 traits (62%) were significantly associated with the GBMI-PRS in the meta-analysis after multiple-testing correction. Moreover, the results of the integrative proteomic analysis implicate IL-1 signaling in the etiology of asthma, reinforcing the candidacy of IL1R1 antagonists for drug repurposing.

14

Distinct and shared genetics of kidney filtration function versus albuminuria revealed by multi-trait GWAS

de Hesselle, H. C.; Garben, B.-F.; Stark, K. J.; Warth, R.; Teumer, A.; Pattaro, C.; Heid, I. M.; Winkler, T. W.

2026-06-09 genetic and genomic medicine 10.64898/2026.06.08.26355141 medRxiv

Top 2%

0.1%

Chronic kidney disease is characterized by decreased glomerular filtration rate (eGFR, estimated from serum creatinine or cystatin C) or increased urinary albumin-to-creatinine-ratio (UACR). Genome-wide association studies provided the genetic make-up of these traits, but their overlap remained largely unknown. Our multi-trait GWAS (N=1M) identified 812 signals and multi-trait fine-mapping sharpened the identification of likely causal variants. Of 333 signals classified for filtration function or albuminuria, only 11 overlapped. Their effects on eGFR and UACR were directionally concordant, dominated by eGFR and independent of HbA1c or mean arterial pressure. Mapped genes pinpointed mechanisms related to glomerular filtration area (SHROOM3, EPB41L5) and sodium-mediated intraglomerular pressure (NRBP1, DPEP1/CHMP1A). Genetics of fluid intake resulted in shadow effects on UACR without albumin leakage into urine. Our multi-trait approach sharpened the identification of likely causal genes for kidney traits, demonstrated largely distinct genetics for filtration function versus albuminuria, and provided new biological insights into the overlap.

15

A risk-of-contagion index using a Bayesian based model for the COVID-19 epidemic in Mexico

Corona-Moreno, R.; Acuna-Zegarra, M. A.; Santana-Cibrian, M.; Velasco-Hernandez, J. X.

2026-06-10 health policy 10.64898/2026.06.09.26355274 medRxiv

Top 2%

0.1%

During the COVID-19 pandemic, limited testing capacity and reporting delays complicated epidemic surveillance and decision-making in Mexico. We calibrated covidestim, a Bayesian nowcasting model, to estimate total SARS-CoV-2 infections from reported cases and deaths using Mexican surveillance data. Disease-progression distribution priors were calibrated using Mexico City records and validated through comparisons with national seroprevalence surveys, hospitalization data, and annual reported severe-case rates across all states. Using the reconstructed estimates of active infections, we implemented an event-based risk framework that quantifies the probability of encountering at least one infectious individual in gatherings of different sizes. This probability was subsequently translated into a four-level epidemiological traffic-light indicator and computed at both state and municipality levels. The resulting estimates revealed substantial spatial heterogeneity that is obscured by state-level aggregation, particularly in states with marked differences between urban and rural municipalities. To evaluate consistency with public-health indicators, we compared the proposed risk classification with the official Mexican epidemiological traffic-light system, considering interpretable gathering sizes relevant to public-health decision making. Weekly reports derived from this framework were delivered to policymakers in the State of Queretaro, Mexico, as an anticipation tool for school reopening and public-space management. These results demonstrate that Bayesian reconstruction of infections combined with event-based risk metrics can provide an interpretable and generalizable municipality-level complement to routine surveillance systems, particularly in regions with limited testing capacity and heterogeneous local transmission dynamics.
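The event-based risk described above can be illustrated with the standard calculation for "at least one infectious attendee", assuming attendees are drawn independently from a population with a given prevalence of active infection. This is a sketch under that assumption, not the authors' implementation; the function names, the color labels, and the cut-points for the four levels are hypothetical, since the abstract does not state them.

```python
def gathering_risk(active_infections, population, group_size):
    """Probability that a gathering includes at least one infectious person.

    Assumes independent draws from a population whose prevalence is
    active_infections / population (a simplifying assumption).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    prevalence = min(max(active_infections / population, 0.0), 1.0)
    return 1.0 - (1.0 - prevalence) ** group_size


def traffic_light(risk, cutpoints=(0.25, 0.50, 0.75)):
    """Map a risk probability to one of four levels.

    The cut-points and color names are hypothetical placeholders.
    """
    levels = ("green", "yellow", "orange", "red")
    for level, cut in zip(levels, cutpoints):
        if risk < cut:
            return level
    return levels[-1]


# Made-up municipality: 100 estimated active infections among 10,000 residents.
for size in (10, 50, 200):
    risk = gathering_risk(100, 10_000, size)
    print(size, round(risk, 3), traffic_light(risk))
```

Because the risk grows with both prevalence and gathering size, the same municipality can fall into different levels depending on which gathering size is chosen as the reference.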

16

Assessment of occupational aerosol exposure for laboratory technicians: A quantitative study using ΦX174 phage as a substitute virus

Liu, B.; Liu, D.; Zhang, H.

2026-06-11 occupational and environmental health 10.64898/2026.06.09.26355304 medRxiv

Top 2%

0.1%

This study aimed to clarify aerosol exposure risks throughout the workflow of a Biosafety Level 2 (BSL-2) polymerase chain reaction (PCR) laboratory, validate the suitability of the ΦX174 bacteriophage as an indicator virus, and provide evidence for biosafety control measures. The ΦX174 bacteriophage was used to simulate viral samples, and a concentration-bacteriophage plaque standard curve was constructed (R² = 0.998). Five operational steps in a simulated PCR laboratory were quantitatively monitored for aerosol concentration using double-layer agar plates, with blank controls used to eliminate interference. Statistical analysis was employed to identify risk differences. Sample homogenization ((5.67 ± 1.23) × 10⁴ plaque-forming units (PFU)/m³) and nucleic acid extraction ((3.45 ± 0.89) × 10⁴ PFU/m³) were identified as high-/very high-risk steps. The viral load in the samples was strongly positively correlated with the aerosol concentration (r = 0.926, P < 0.001), with aerosol levels linearly decreasing with increasing distance in high-risk steps. The ΦX174 bacteriophage demonstrated high detection sensitivity (10¹ PFU/ml) and safety compatibility with BSL-2 laboratories. Aerosol risks in PCR laboratories exhibit step-specific differentiation, and ΦX174 serves as an ideal indicator virus. Proposed strategies such as equipment upgrades and personal protective equipment (PPE) grading can reduce exposure risks.

17

Integrated cardiometabolic and nutritional risk profiling identifies pregnancy loss as a marker of systemic metabolic vulnerability

Agarwal, T.; Namburu, J. R.; Kachroo, P.

2026-06-08 epidemiology 10.64898/2026.06.04.26354910 medRxiv

Top 2%

0.1%

Background: Pregnancy loss has important implications for women's health. Although maternal age is a well-established risk factor, the contribution of routinely measured cardiometabolic and behavioral markers at population scale remains incompletely characterized. Objective: To examine associations between cardiometabolic, nutritional, and behavioral risk markers and pregnancy loss among U.S. women of reproductive age. Methods: We conducted a cross-sectional analysis of 4,842 U.S. women aged 20-44 years with ≥1 pregnancy using National Health and Nutrition Examination Survey data (2013-2023). Pregnancy loss was defined as ≥1 prior miscarriage. Exposures included body mass index, smoking exposure (cotinine), lipid biomarkers, vitamin D and folate, and a composite cardiometabolic-nutritional risk score. Survey-weighted logistic regression estimated adjusted odds ratios (aORs) and 95% confidence intervals, with bootstrap resampling for predictor robustness. Results: The weighted prevalence of pregnancy loss was 23%. Higher odds of pregnancy loss were associated with increasing age (aOR per year=1.02; 95% CI: 1.00-1.04), Non-Hispanic Black race (aOR=1.32; 95% CI: 1.00-1.74), overweight (aOR=1.56; 95% CI: 1.16-2.11), obesity (aOR=2.06; 95% CI: 1.39-3.05), and smoking (aOR=1.58; 95% CI: 1.19-2.10). Adverse lipid profiles, particularly elevated triglycerides (aOR=1.83; 95% CI: 1.16-2.90) and high low-density lipoprotein (aOR=2.97; 95% CI: 1.45-6.61), were independently associated with pregnancy loss. Vitamin D and folate were not stable predictors. Higher composite cardiometabolic-nutritional risk scores were observed among women with pregnancy loss (P=0.026). Conclusion: Pregnancy loss clustered with adverse cardiometabolic and behavioral risk markers in a nationally representative population. These findings highlight pregnancy loss as a marker of broader metabolic vulnerability, supporting the need for longitudinal studies and cardiometabolic profiling to inform preconception care and risk stratification.

18

Global population frequencies of NAT2 star alleles observed in three large biobanks

Sangkuhl, K.; Whirl-Carrillo, M.; Woon, M.; Venkatesh, R.; Keat, K.; Whaley, R.; Ritchie, M. D.; Klein, T. E.

2026-06-11 genetic and genomic medicine 10.64898/2026.06.09.26355281 medRxiv

Top 2%

0.1%

NAT2 is an important pharmacogene that encodes the N-acetyltransferase 2 enzyme, which is involved in the metabolism of multiple medications; variants in this gene can affect patient response to these medications. CPIC has published a clinical guideline for prescribing hydralazine using NAT2 genotypes. Just prior to the guideline, updated NAT2 star allele numbering and definitions were released, differing somewhat from the historical nomenclature. Clinical pharmacogenomic testing panels often test for the most common star alleles, so knowledge of the most common updated NAT2 star alleles is critical for the implementation of the CPIC NAT2/hydralazine guideline. We first determined NAT2 diplotype frequencies from UK Biobank (UKBB) 200k phased genomes, then analyzed allele, diplotype, and phenotype population frequencies from the All of Us Research Program, PennMedicine BioBank (PMBB), and UKBB 500k datasets. We found that analyzing NAT2 diplotypes from phased data provides critical information for algorithms designed to predict diplotypes from unphased data. We observed that NAT2*5, *6, and *4 were the most common star alleles, in that order, and the top 11 most frequent NAT2 star alleles were the same across all biobanks. However, differences in star allele frequencies across biogeographical populations were observed. The largest difference led to a higher frequency of NAT2 poor metabolizer phenotypes as compared to rapid and intermediate metabolizer phenotypes in all global populations except the EAS population, where NAT2 poor metabolizers were in the minority.

19

Human genetic evidence links serine biosynthesis to diabetic peripheral neuropathy

Fridman, V.; Kakar, A.; Jensen, A.; Van de Vondel, L.; Wheeler, A.; Phillips, L. S.; Zhou, J.; Zuchner, S.; Reusch, J.; Raghavan, S.

2026-06-10 genetic and genomic medicine 10.64898/2026.06.09.26355286 medRxiv

Top 2%

0.1%

Diabetic peripheral neuropathy (DPN) is a common and disabling condition for which no disease-modifying therapies are available. Glycemic and metabolic drivers do not fully explain why only a subset of individuals with diabetes develop DPN, and genetic contributors remain poorly defined. We aimed to perform a multi-population genome-wide association study (GWAS) of DPN to highlight potential new etiological pathways and therapeutic targets. Methods: We performed a multi-population GWAS of neuropathy in people with and without diabetes using the VA Million Veteran Program and UK Biobank, followed by replication in the All of Us Research Program (AoU), and gene-based and gene-set analyses to identify implicated pathways. Causal relationships between circulating serine levels and DPN were further tested using two-sample Mendelian randomization. To further evaluate pathogenic potential, we analyzed rare, high-impact variants in GWAS-implicated genes among individuals with unresolved inherited neuropathies using the GENESIS platform. Findings: Among individuals with type 2 diabetes, we identified seven genome-wide significant loci (p<5x10-): PHGDH and PSPH (key serine synthesis genes), TEAD1, CYP4F11, LARGE1, FTO, and COBLL1. No loci were significant in individuals without diabetes or with type 1 diabetes. Four loci (PHGDH, TEAD1, FTO, and CYP4F11) replicated in AoU (p < 0.05). Mendelian randomization demonstrated that higher genetically predicted serine levels were associated with lower DPN risk, consistent with a causal role of serine metabolism in disease pathogenesis. Rare-variant burden analyses revealed associations of predicted deleterious variants with inherited neuropathy case status in PHGDH (odds ratio [OR] 12.7 [95% CI 7.9, 20.4]), PSPH (OR 8.5 [7.2, 10.2]), PHKG1 (OR 4.8 [3.7, 6.3]), and LARGE1 (OR 0.007 [0.0004, 0.1]). Interpretation: Convergent genetic evidence across common and rare variation implicates serine synthesis as a key pathway in DPN. These findings link diabetic and inherited neuropathies through a shared metabolic mechanism, identifying serine metabolism as a potential therapeutic target.

20

STELLAR: A flexible ensemble learning framework integrating rare variants to enhance polygenic risk prediction

Chen, T.; Li, X.; Mazumder, R.; Zhang, H.; Lin, X.

2026-06-09 genetic and genomic medicine 10.64898/2026.06.07.26355109 medRxiv

Top 2%

0.1%

Whole-exome and whole-genome sequencing technology has enabled the discovery of rare genetic variants associated with human health and diseases. However, existing statistical methods used for rare variant association testing are not well-suited for building genetic risk prediction models that jointly incorporate rare and common variants. We propose STELLAR, a flexible ensemble learning-based approach to compute rare variant polygenic risk scores (PRS) using association summary statistics to enhance conventional common variant PRS. Our method combines burden-based and penalty-based rare variant analysis and leverages functional annotation information to prioritize potentially causal variants within the prediction models. In simulation studies, PRS using STELLAR consistently showed the highest prediction accuracy compared to models using common variants alone or rare variant burdens. Applied to UK Biobank whole-exome sequencing data (n=310,831) across eight continuous and five binary traits, STELLAR significantly improved prediction accuracy, refined stratification of individuals at the highest genetic risk beyond common variants, and prioritized biologically relevant genes. STELLAR provides a scalable strategy to incorporate rare variants into PRS in addition to common variants, advancing precision risk prediction and enabling more comprehensive assessment of genetic contributions to complex diseases.